
    Scalable Full Flow with Learned Binary Descriptors

    We propose a method for large-displacement optical flow in which local matching costs are learned by a convolutional neural network (CNN) and a smoothness prior is imposed by a conditional random field (CRF). We tackle the computation- and memory-intensive operations on the 4D cost volume with a min-projection, which reduces memory complexity from quadratic to linear, and with binary descriptors for efficient matching. This enables evaluation of the cost on the fly and allows learning and CRF inference to be performed on high-resolution images without ever storing the 4D cost volume. To address the problem of learning binary descriptors we propose a new hybrid learning scheme. In contrast to current state-of-the-art approaches for learning binary CNNs, we can compute the exact non-zero gradient within our model. We compare several methods for training binary descriptors and show results on publicly available benchmarks.
    Comment: GCPR 201
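    The min-projection is the key memory trick here: rather than storing costs for every (u, v) displacement pair, only the per-axis minima are kept, and binary descriptors make each cost a cheap Hamming distance that can be recomputed on demand. Below is a minimal NumPy sketch of our reading of this idea; the array shapes, window parameterisation and function names are our assumptions, not the authors' implementation.

```python
# Sketch: min-projected matching costs from binary descriptors, computed on
# the fly. Storing two (H, W, D) projections is linear in the displacement
# range D, versus quadratic (D^2) for the full 4D volume.
import numpy as np

def hamming_cost(d1, d2):
    """Hamming distance between two uint8-packed binary descriptors."""
    return np.unpackbits(d1 ^ d2).sum()

def min_projections(desc1, desc2, max_disp):
    """desc1, desc2: (H, W, B) uint8 descriptor maps for the two frames.
    Returns the min over v (indexed by u) and the min over u (indexed by v),
    never materialising the (H, W, D, D) 4D cost volume. Pure-Python loops
    keep the sketch readable; a real implementation would vectorise them."""
    H, W, _ = desc1.shape
    D = 2 * max_disp + 1
    proj_u = np.full((H, W, D), np.inf)
    proj_v = np.full((H, W, D), np.inf)
    for y in range(H):
        for x in range(W):
            for v in range(-max_disp, max_disp + 1):
                for u in range(-max_disp, max_disp + 1):
                    yy, xx = y + v, x + u
                    if 0 <= yy < H and 0 <= xx < W:
                        c = hamming_cost(desc1[y, x], desc2[yy, xx])  # on the fly
                        proj_u[y, x, u + max_disp] = min(proj_u[y, x, u + max_disp], c)
                        proj_v[y, x, v + max_disp] = min(proj_v[y, x, v + max_disp], c)
    return proj_u, proj_v
```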

    Biologically Inspired Vision for Indoor Robot Navigation

    Ultrasonic, infrared, laser and other sensors are being applied in robotics. Although combinations of these have allowed robots to navigate, they are only suited to specific scenarios, depending on their limitations. Recent advances in computer vision are turning cameras into useful low-cost sensors that can operate in most types of environments. Cameras enable robots to detect obstacles, recognize objects, obtain visual odometry, and detect and recognize people and gestures, among other possibilities. In this paper we present a completely biologically inspired vision system for robot navigation. It comprises stereo vision for obstacle detection and object recognition for landmark-based navigation. We employ a novel keypoint descriptor which encodes the responses of cortical complex cells. We also present a biologically inspired saliency component based on disparity and colour.
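    The paper's descriptor encodes responses of cortical complex cells; the classical computational model of a complex cell is the energy of a quadrature pair of Gabor filters. A hedged OpenCV sketch of such a response map follows; the filter-bank parameters are illustrative assumptions, not the authors' values.

```python
# Sketch: complex-cell (Gabor-energy) response maps over several orientations.
import cv2
import numpy as np

def complex_cell_responses(gray, orientations=8, sigma=4.0, lambd=10.0):
    """Return an (H, W, orientations) stack of phase-invariant Gabor energies,
    sqrt(even^2 + odd^2), the standard complex-cell model."""
    img = gray.astype(np.float32) / 255.0
    ksize = int(6 * sigma) | 1                    # odd kernel covering ~3 sigma
    stack = []
    for i in range(orientations):
        theta = i * np.pi / orientations
        even = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, 0.5, psi=0)
        odd = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, 0.5, psi=np.pi / 2)
        re = cv2.filter2D(img, cv2.CV_32F, even)  # even (cosine) simple cell
        im = cv2.filter2D(img, cv2.CV_32F, odd)   # odd (sine) simple cell
        stack.append(np.sqrt(re * re + im * im))  # quadrature energy
    return np.dstack(stack)
```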

    The Brightness Clustering Transform and Locally Contrasting Keypoints

    In recent years a new wave of feature descriptors has been presented to the computer vision community: ORB, BRISK and FREAK, amongst others. These new descriptors reduce time and memory consumption in the processing and storage stages of tasks such as image matching or visual odometry, enabling real-time applications. The problem is now the lack of fast interest-point detectors with good repeatability to use with these new descriptors. We present a new blob detector which can be implemented in real time and is faster than most of the currently used feature detectors. The detection is achieved with an innovative non-deterministic low-level operator called the Brightness Clustering Transform (BCT). The BCT can be thought of as a coarse-to-fine search through scale spaces for the true derivative of the image; it also mimics trans-saccadic perception of human vision. We call the new algorithm the Locally Contrasting Keypoints detector, or LOCKY. Showing good repeatability and robustness to the image transformations included in the Oxford dataset, LOCKY is amongst the fastest affine-covariant feature detectors.
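    As a rough illustration of that coarse-to-fine search, a BCT-style operator can be sketched as random windows that repeatedly descend into their brightest quadrant (via an integral image) and vote where they land, so that votes cluster on blob centres. Everything below, including window counts, sizes and the voting rule, is our assumption for illustration, not the published algorithm's constants.

```python
# Sketch: random coarse-to-fine voting in the spirit of the BCT.
import numpy as np

def bct_votes(gray, n_votes=10000, win=64, min_size=4, seed=0):
    rng = np.random.default_rng(seed)
    H, W = gray.shape
    # integral image with a leading zero row/column for O(1) box sums
    ii = np.pad(gray.astype(np.float64).cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    def box_sum(y, x, s):                        # sum of gray[y:y+s, x:x+s]
        return ii[y + s, x + s] - ii[y, x + s] - ii[y + s, x] + ii[y, x]
    votes = np.zeros((H, W), np.int32)
    for _ in range(n_votes):
        s = win
        y = int(rng.integers(0, H - s)); x = int(rng.integers(0, W - s))
        while s > min_size:                      # coarse-to-fine descent
            s //= 2
            quads = [(y, x), (y, x + s), (y + s, x), (y + s, x + s)]
            y, x = max(quads, key=lambda q: box_sum(q[0], q[1], s))
        votes[y + s // 2, x + s // 2] += 1       # vote at the landing centre
    return votes  # bright blobs; run on the inverted image for dark blobs
```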

    Multiscale Shape Description with Laplacian Profile and Fourier Transform

    We propose a new local multiscale image descriptor of variable size. The descriptor combines Laplacian of Gaussian values at different scales with a radial Fourier transform. This descriptor provides a compact description of the appearance of a local neighborhood in a manner that is robust to changes in scale and orientation. We evaluate this descriptor by measuring repeatability and recall against 1-precision on the Affine Covariant Features benchmark dataset, as well as on a set of textureless images from the MIRFLICKR Retrieval Evaluation dataset. Experiments reveal performance competitive with the state of the art, while providing a more compact representation.
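    A hedged sketch of how these two ingredients could combine at a keypoint: a profile of Laplacian-of-Gaussian values across scales, concatenated with FFT magnitudes of intensity samples taken on rings around the point (discarding phase is what buys robustness to rotation). Scales, radii and sample counts are illustrative assumptions, not the paper's parameters.

```python
# Sketch: multiscale Laplacian profile + radial Fourier magnitudes.
import cv2
import numpy as np

def laplacian_profile_descriptor(gray, x, y, sigmas=(1, 2, 4, 8),
                                 radii=(3, 6, 12), n_theta=16):
    """Assumes the keypoint (x, y) lies away from the image border."""
    img = gray.astype(np.float32)
    # Laplacian-of-Gaussian value at the keypoint for each scale
    profile = [cv2.Laplacian(cv2.GaussianBlur(img, (0, 0), s), cv2.CV_32F)[y, x]
               for s in sigmas]
    # radial Fourier part: FFT magnitudes of samples on each ring
    rings = []
    for r in radii:
        t = 2 * np.pi * np.arange(n_theta) / n_theta
        ys = (y + r * np.sin(t)).astype(int)
        xs = (x + r * np.cos(t)).astype(int)
        rings.extend(np.abs(np.fft.rfft(img[ys, xs])))
    return np.array(profile + rings, np.float32)
```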

    VPR-Bench: An Open-Source Visual Place Recognition Evaluation Framework with Quantifiable Viewpoint and Appearance Change

    Visual place recognition (VPR) is the process of recognising a previously visited place using visual information, often under varying appearance conditions and viewpoint changes and with computational constraints. VPR is related to the concepts of localisation, loop closure and image retrieval, and is a critical component of many autonomous navigation systems ranging from autonomous vehicles to drones and computer vision systems. While the concept of place recognition has been around for many years, VPR research has grown rapidly as a field over the past decade, due to improving camera hardware and its potential for deep learning-based techniques, and has become a widely studied topic in both the computer vision and robotics communities. This growth, however, has led to fragmentation and a lack of standardisation in the field, especially concerning performance evaluation. Moreover, the notion of viewpoint and illumination invariance of VPR techniques has largely been assessed qualitatively, and hence ambiguously, in the past. In this paper, we address these gaps through a new comprehensive open-source framework for assessing the performance of VPR techniques, dubbed “VPR-Bench”. VPR-Bench (open-sourced at https://github.com/MubarizZaffar/VPR-Bench) introduces two much-needed capabilities for VPR researchers: firstly, it contains a benchmark of 12 fully integrated datasets and 10 VPR techniques, and secondly, it integrates a comprehensive variation-quantified dataset for quantifying viewpoint and illumination invariance. We apply and analyse popular evaluation metrics for VPR from both the computer vision and robotics communities, and discuss how these different metrics complement and/or replace each other, depending upon the underlying applications and system requirements. Our analysis reveals that no universal state-of-the-art (SOTA) VPR technique exists, since: (a) SOTA performance is achieved by 8 out of the 10 techniques on at least one dataset, and (b) the SOTA technique in one community does not necessarily yield SOTA performance in the other, given the differences in datasets and metrics. Furthermore, we identify key open challenges: (c) all 10 techniques suffer greatly in perceptually aliased and less-structured environments, (d) all techniques suffer from viewpoint variance, where lateral change has less effect than 3D change, and (e) directional illumination change has a more adverse effect on matching confidence than uniform illumination change. We also present detailed meta-analyses regarding the roles of varying ground truths, platforms, application requirements and technique parameters. Finally, VPR-Bench provides a unified implementation to deploy these VPR techniques, metrics and datasets, and is extensible through templates.
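    To make the metric discussion concrete, here is a minimal sketch of the two metric families the paper contrasts, computed from a query-versus-reference similarity matrix and boolean ground truth: a precision-recall sweep (the computer-vision staple) and Recall@N (common in robotics). The function names and data layout are ours, not VPR-Bench's API.

```python
# Sketch: precision-recall and Recall@N for place recognition.
import numpy as np

def pr_curve(similarity, gt):
    """similarity: (Q, R) scores; gt: (Q, R) bool, True where ref matches query."""
    best = similarity.argmax(axis=1)              # top-1 retrieval per query
    correct = gt[np.arange(len(best)), best]
    order = np.argsort(-similarity.max(axis=1))   # sweep a confidence threshold
    tp = np.cumsum(correct[order])
    precision = tp / np.arange(1, len(tp) + 1)
    recall = tp / max(gt.any(axis=1).sum(), 1)    # over queries with a true match
    return precision, recall

def recall_at_n(similarity, gt, n=5):
    """Fraction of queries with a true match among their N best references."""
    topn = np.argsort(-similarity, axis=1)[:, :n]
    hits = gt[np.arange(len(gt))[:, None], topn].any(axis=1)
    return hits.mean()
```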

    Evaluation Method, Dataset Size or Dataset Content: How to Evaluate Algorithms for Image Matching?

    Most vision papers have to include some evaluation work in order to demonstrate that the proposed algorithm is an improvement on existing ones. Generally, these evaluation results are presented in tabular or graphical form. Neither is ideal, because there is no indication of whether any performance differences are statistically significant. Moreover, the size and nature of the dataset used for evaluation will obviously have a bearing on the results, yet neither of these factors is usually discussed. This paper evaluates the effectiveness of commonly used performance characterization metrics for image feature detection and description in matching problems, and explores the use of statistical tests such as McNemar’s test and ANOVA as better alternatives.
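    As an illustration of the suggested alternative, below is a minimal sketch of McNemar's test applied to the per-image success/failure outcomes of two algorithms evaluated on the same dataset, with the usual continuity correction. The helper is ours, for illustration only.

```python
# Sketch: McNemar's test on paired per-image outcomes of two algorithms.
import numpy as np
from scipy.stats import chi2

def mcnemar(success_a, success_b):
    """success_a, success_b: boolean arrays, one entry per test image."""
    a, b = np.asarray(success_a), np.asarray(success_b)
    n01 = int(np.sum(a & ~b))    # A succeeded where B failed
    n10 = int(np.sum(~a & b))    # B succeeded where A failed
    # continuity-corrected statistic on the discordant pairs only
    stat = (abs(n01 - n10) - 1) ** 2 / max(n01 + n10, 1)
    return stat, chi2.sf(stat, df=1)

# A small p-value (say < 0.05) suggests the difference between the two
# algorithms is statistically significant rather than a dataset artefact.
```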

    Autonomous Flight and Real-time Tracking of Nano Unmanned Aerial Vehicle

    This study describes a system in which a micro UAV (quadrotor) was coupled with a Kinect (v2), a Myo armband and an RGB camera. The quadrotor was connected to two PC clients or workstations and communicated through the Robot Operating System. Using the depth sensor, the UAV moved to marked targets in a cluttered environment without collision. It recognises faces via the on-board camera on a frame-by-frame basis and uses feature-based monocular simultaneous localisation and mapping (SLAM) in real time. The SLAM system tracks the pose of the quadrotor while simultaneously building an incremental map of the surrounding environment in which to locate the UAV. The Myo armband was employed for teleoperation, commanding the quadrotor via hand gestures to start or stop its journey or to begin a new task. The face recognition algorithm was developed using the Fisherface library and a pre-trained database. Three missions were assigned to the UAV: to detect the marked area via the Kinect's depth sensor; to fly towards and hover around the marked area, sending image/video streams to the ground station; and to look for a person's face in a crowded environment, match the name with the face's owner and follow him/her within a distance of 22 m. Various organisations could use the proposed system for different purposes, such as search and rescue, environmental monitoring, surveillance or inspection. It could be used to identify a person in a collapsed building or in urban/suburban areas, or to locate people with particular needs (e.g. Alzheimer's or dementia patients prone to wandering behaviour).
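    As an illustration of the recognition step, here is a minimal sketch using OpenCV's contrib face module (opencv-contrib-python), which ships a Fisherface recognizer. The face detector, crop size and acceptance threshold are our assumptions, not the authors' configuration.

```python
# Sketch: frame-by-frame Fisherface recognition with OpenCV contrib.
import cv2
import numpy as np

def train_fisherfaces(images, labels):
    """images: list of equal-sized grayscale face crops; labels: int ids."""
    model = cv2.face.FisherFaceRecognizer_create()
    model.train(images, np.asarray(labels))
    return model

def recognise_frame(model, gray_frame, cascade, size=(100, 100), max_dist=500.0):
    """Detect faces in one frame; return (label, box) for confident matches."""
    found = []
    for (x, y, w, h) in cascade.detectMultiScale(gray_frame, 1.2, 5):
        crop = cv2.resize(gray_frame[y:y + h, x:x + w], size)
        label, dist = model.predict(crop)   # smaller distance = better match
        if dist < max_dist:                 # assumed acceptance threshold
            found.append((label, (x, y, w, h)))
    return found
```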

    Early predictors of impaired social functioning in male rhesus macaques (Macaca mulatta)

    Autism spectrum disorder (ASD) is characterized by social cognition impairments, but its basic disease mechanisms remain poorly understood. Progress has been impeded by the absence of animal models that manifest behavioral phenotypes relevant to ASD. Rhesus monkeys are an ideal model organism to address this barrier to progress. Like humans, rhesus monkeys are highly social, possess complex social cognition abilities, and exhibit pronounced individual differences in social functioning. Moreover, we have previously shown that Low-Social (LS) vs. High-Social (HS) adult male monkeys exhibit lower social motivation and poorer social skills. It is not known, however, when these social deficits first emerge. The goals of this study were to test whether juvenile LS and HS monkeys differed as infants in their ability to process social information, and whether infant social abilities predicted later social classification (i.e., LS vs. HS), in order to facilitate earlier identification of monkeys at risk for poor social outcomes. Social classification was determined for N = 25 LS and N = 25 HS male monkeys that were 1–4 years of age. As part of a colony-wide assessment, these monkeys had previously undergone, as infants, tests of face recognition memory and the ability to respond appropriately to conspecific social signals. Monkeys later identified as LS (vs. HS) showed impairments in recognizing familiar vs. novel faces and in the species-typical adaptive ability to gaze avert to scenes of conspecific aggression. Additionally, a multivariate logistic regression using infant social ability measures perfectly predicted the later social classification of all N = 50 monkeys. These findings suggest that an early capacity to process important social information may account for differences in rhesus monkeys’ motivation and competence to establish and maintain social relationships later in life. Further development of this model will facilitate identification of novel biological targets for intervention to improve social outcomes in at-risk young monkeys.
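    The final analytic step is a multivariate logistic regression from infant measures to later LS/HS classification. A minimal scikit-learn sketch of that kind of analysis is shown below with placeholder data; the features and labels are illustrative, not the study's measures (on which the authors report perfect prediction).

```python
# Sketch: logistic regression predicting social classification from
# infant social-ability measures (placeholder data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))        # rows: monkeys; cols: infant measures
y = np.repeat([0, 1], 25)           # 0 = Low-Social, 1 = High-Social

clf = LogisticRegression().fit(X, y)
print("training accuracy:", clf.score(X, y))
```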